Zero-knowledge document comparison between mutually distrustful parties

ABSTRACT

Prior zero-knowledge protocols have been used for exchanging secret keys, but not for comparing documents. The present invention provides a method of zero-knowledge document comparison between mutually distrustful parties in which each party exchanges a set of random data and a shared hash function with the other, each party applies the hash function to concatenations of its document with both sets of random data, and the parties compare the resulting hashes.

TECHNICAL FIELD

[0001] The invention relates to the field of cryptography and more particularly to zero-knowledge methods for comparing documents between two parties.

BACKGROUND ART

[0002] A zero-knowledge protocol, like other types of interactive proof, is a protocol between two parties in which one party (the prover) tries to prove a fact to the other party (the verifier). The fact is typically secret information such as a password or, in cryptographic applications, the private key of a public-key encryption algorithm. In a zero-knowledge protocol, the prover can convince the verifier that he is in possession of the secret without revealing the secret itself. In particular, zero-knowledge protocols are cryptographic protocols in which: 1) the verifier cannot learn anything from the protocol (no knowledge is transferred); 2) the prover cannot cheat the verifier and vice versa; and 3) the verifier cannot pretend to be the prover to any third party. Thus in a zero-knowledge protocol, neither the fact or secret itself nor any other useful information is revealed to the other party during the protocol, or to any eavesdropper. The Fiat-Shamir protocol was the first practical zero-knowledge cryptographic protocol.

[0003] Hash functions are commonly used in cryptography. A one-way hash function is a function that takes a variable-length input string and converts it into a fixed-length output string. An example of such a hash function is the SHA-1 function. It is computationally infeasible to determine the input string from the hashed string.
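As an illustration of the fixed-length property described above, the following minimal sketch uses Python's standard `hashlib` module. It assumes SHA-256 rather than SHA-1, because practical SHA-1 collisions have been demonstrated; the helper name `one_way_hash` is hypothetical.

```python
import hashlib


def one_way_hash(data: bytes) -> bytes:
    """Map a variable-length input to a fixed-length (32-byte) SHA-256 digest."""
    return hashlib.sha256(data).digest()


short_digest = one_way_hash(b"a")
long_digest = one_way_hash(b"a" * 1_000_000)

# Both digests have the same fixed length regardless of input size.
print(len(short_digest), len(long_digest))  # 32 32

# Changing a single character of the input yields a completely different digest.
print(one_way_hash(b"document") == one_way_hash(b"Document"))  # False
```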

[0004] In some situations where A and B are two distrustful parties, it may be necessary for the parties to learn whether two documents, one held by each party, are the same or substantially the same. For example, B may claim to have a copy of A's secret document, and A's course of action may hinge on whether B's claim is true. Neither party, however, can disclose its document to the other in order to verify B's claim without destroying the document's secrecy. While zero-knowledge protocols are known for exchanging secret keys, they have not been used for comparing documents.

[0005] There is therefore a need for a strong zero-knowledge document comparison method between mutually distrustful parties.

DISCLOSURE OF INVENTION

[0006] The present invention therefore provides a method of securely comparing a first document in possession of a first party and a second document in possession of a second party, without revealing the contents of the first document to the second party or the contents of the second document to the first party, said method comprising the steps of:

[0007] i) said first and second parties each generating its own set of random data;

[0008] ii) each party exchanging its set of random data and a shared hash function with the other party;

[0009] iii) each party computing a first value consisting of the output of the shared hash function where the input to the hash function is the consecutive concatenation of the document in that party's possession, followed by that party's set of random data, followed by the other party's set of random data;

[0010] iv) each party computing a second value consisting of the output of the shared hash function where the input to the hash function is the consecutive concatenation of the document in that party's possession, followed by the other party's set of random data, followed by that party's set of random data;

[0011] v) each party sending its first value to the other party and receiving the other party's first value;

[0012] vi) each party comparing the other party's first value to its own second value; and

[0013] vii) each party concluding that, if the values are the same, the two documents are the same, and otherwise that the two documents are different.
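Steps i) to vii) above can be sketched as a single-process simulation. This is a minimal sketch under stated assumptions, not a definitive implementation: SHA-256 stands in for the shared hash function, 32 bytes from `secrets.token_bytes` stand in for each party's random data, the `Party` class and `compare` helper are hypothetical names, and no network transport is modelled.

```python
import hashlib
import secrets


def shared_hash(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the shared hash function of step ii).
    return hashlib.sha256(data).digest()


class Party:
    def __init__(self, document: bytes) -> None:
        self.document = document
        # Step i): each party generates its own set of random data.
        self.own_random = secrets.token_bytes(32)
        self.other_random = b""

    def receive_random(self, other_random: bytes) -> None:
        # Step ii): store the other party's random data.
        self.other_random = other_random

    def first_value(self) -> bytes:
        # Step iii): H(document + own random data + other party's random data).
        return shared_hash(self.document + self.own_random + self.other_random)

    def second_value(self) -> bytes:
        # Step iv): H(document + other party's random data + own random data).
        return shared_hash(self.document + self.other_random + self.own_random)

    def documents_match(self, other_first_value: bytes) -> bool:
        # Steps vi) and vii): compare the other party's first value to our second value.
        return other_first_value == self.second_value()


def compare(doc_a: bytes, doc_b: bytes) -> tuple[bool, bool]:
    a, b = Party(doc_a), Party(doc_b)
    a.receive_random(b.own_random)
    b.receive_random(a.own_random)
    # Step v): each party sends its first value to the other.
    first_a, first_b = a.first_value(), b.first_value()
    return a.documents_match(first_b), b.documents_match(first_a)


print(compare(b"secret plan", b"secret plan"))  # (True, True)
print(compare(b"secret plan", b"other plan"))   # (False, False)
```

Because both random strings have the same fixed length in this sketch, plain concatenation is unambiguous: two inputs ending in the same random suffixes can only be equal if the documents are equal.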

[0014] The invention further provides a computer program product and an article for carrying out the method.

BRIEF DESCRIPTION OF DRAWINGS

[0015] In drawings which disclose a preferred embodiment of the invention:

[0016] FIG. 1 is a schematic illustration of a computer network according to the present invention; and

[0017] FIG. 2 is a flow chart illustrating the method of the invention.

BEST MODE(S) FOR CARRYING OUT THE INVENTION

[0018] With reference to FIG. 1, a communications link, such as a computer network, is designated generally as 10. Parties A and B, who distrust each other, communicate between their respective computers 12 and 14, which have central processors and are capable of generating random numbers and comparing numbers. A possesses a document containing information, in electronic form or otherwise, referred to as document1. B possesses a document containing information, in electronic form or otherwise, referred to as document2. Parties A and/or B would like to take some further action only if one or both of them can be assured that they both have the same document. They may not care to know each other's identity.

[0019] If the respective documents, document1 and document2, are not already in the form of a bit string, they are scanned or otherwise converted to that format. Next, A sends B a collection of random bits, Ra, preferably incorporating a timestamp. B sends A a collection of random bits, Rb, preferably incorporating a timestamp. A compares Ra to Rb and aborts the comparison if they are the same, since the comparison works only if the random numbers generated by A and B are different: with a single shared hash function and Ra equal to Rb, each party's first and second values would be identical, so a party without the document could pass the comparison simply by echoing back the value it received. Similarly, B compares Rb to Ra and aborts the comparison if they are the same. The parties then restart with freshly generated random numbers if they wish to continue.
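A minimal sketch of the random-data generation and suitability check follows. The 8-byte nanosecond timestamp prefix, the 32-byte random suffix, SHA-256 as the single shared hash, and the helper names are all assumptions for illustration; the text above only says that the random bits preferably incorporate a timestamp. The last function shows, for the single-hash case, why identical random strings must be rejected.

```python
import hashlib
import secrets
import struct
import time


def make_random_data(num_random_bytes: int = 32) -> bytes:
    # Hypothetical layout: 8-byte big-endian nanosecond timestamp + random bytes.
    timestamp = struct.pack(">Q", time.time_ns())
    return timestamp + secrets.token_bytes(num_random_bytes)


def is_suitable(own: bytes, other: bytes) -> bool:
    # Abort the comparison if the other party's random data equals our own.
    return own != other


def h(data: bytes) -> bytes:
    # Single shared hash function (SHA-256 assumed for illustration).
    return hashlib.sha256(data).digest()


def echo_attack_succeeds(ra: bytes, rb: bytes) -> bool:
    """Can B, without the document, pass A's check by echoing firstValueA back?"""
    document1 = b"A's secret document"
    first_value_a = h(document1 + ra + rb)   # sent by A to B
    second_value_a = h(document1 + rb + ra)  # kept by A for the comparison
    echoed = first_value_a                   # B simply returns what it received
    return echoed == second_value_a


ra, rb = make_random_data(), make_random_data()
print(is_suitable(ra, rb))           # True, except with negligible probability
print(echo_attack_succeeds(ra, ra))  # True: identical random data enables the echo
print(echo_attack_succeeds(ra, rb))  # False: distinct random data defeats it
```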

[0020] Once A and B have exchanged non-identical random strings Ra and Rb, and have agreed on one-way hash functions H₁ and H₂, A computes firstValueA by concatenating document1 with Ra and Rb, in that order, to form the string document1+Ra+Rb, and then applying the one-way hash function H₁ to that string. Any suitable cryptographic one-way hash function, such as the SHA-1 function, may be used. A then computes secondValueA by concatenating document1+Rb+Ra, in that order, and applying the one-way hash function H₂ to it. Similarly, B computes firstValueB by concatenating document2 with Rb and Ra, in that order, to form the string document2+Rb+Ra, and then applying H₂ to that string. B then computes secondValueB by concatenating document2 with Ra and Rb, in that order, to form the string document2+Ra+Rb, and then applying H₁ to that string. Hash functions H₁ and H₂ may be the same.
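Writing \(\|\) for concatenation (notation assumed here; the text uses "+"), the four values of paragraph [0020] and the two comparisons of paragraph [0022] can be summarized as follows.

```latex
\begin{aligned}
\mathit{firstValueA}  &= H_1(\mathit{document1} \,\|\, R_a \,\|\, R_b), &\quad
\mathit{secondValueA} &= H_2(\mathit{document1} \,\|\, R_b \,\|\, R_a),\\
\mathit{firstValueB}  &= H_2(\mathit{document2} \,\|\, R_b \,\|\, R_a), &\quad
\mathit{secondValueB} &= H_1(\mathit{document2} \,\|\, R_a \,\|\, R_b),\\[4pt]
\text{A checks: } \mathit{firstValueB} &= \mathit{secondValueA}
  \iff H_2(\mathit{document2} \,\|\, R_b \,\|\, R_a) = H_2(\mathit{document1} \,\|\, R_b \,\|\, R_a),\\
\text{B checks: } \mathit{firstValueA} &= \mathit{secondValueB}
  \iff H_1(\mathit{document1} \,\|\, R_a \,\|\, R_b) = H_1(\mathit{document2} \,\|\, R_a \,\|\, R_b).
\end{aligned}
```

Both checks hold when document1 equals document2; when the documents differ, they fail except in the event of a hash collision, which is negligible for a collision-resistant hash function.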

[0021] It has been agreed beforehand that A will transmit the hashed value firstValueA to B first, although the method works regardless of which party sends its value first. Upon completion of the foregoing steps, A sends B a message indicating that it has computed firstValueA and secondValueA; either before, after, or at the same time as A sends that message, B sends A a message indicating that it has computed firstValueB and secondValueB. A then sends B firstValueA, and B sends A firstValueB immediately upon receipt of firstValueA. If A does not receive firstValueB within a pre-determined interval, such as a few milliseconds (and in the absence of some other explanation, such as a communication breakdown), A may conclude that B does not have the same document and is trying to gain an advantage over A.
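The timeout rule can be sketched with Python's standard `queue` and `threading` modules. Everything here is illustrative: the 0.2-second limit (longer than the few milliseconds mentioned above, so that the demonstration runs reliably), the in-process queue standing in for the network, and the helper names `await_first_value`, `simulated_b` and `a_after_sending_first_value` are assumptions.

```python
import queue
import threading
import time
from typing import Optional

# Hypothetical limit. The text suggests a few milliseconds; 0.2 s is used here
# so that the demonstration behaves reliably on a loaded machine.
RESPONSE_TIMEOUT_S = 0.2


def await_first_value(inbox: queue.Queue, timeout_s: float = RESPONSE_TIMEOUT_S) -> Optional[bytes]:
    """Return the other party's first value, or None if it does not arrive in time."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None


def simulated_b(inbox: queue.Queue, first_value_b: bytes, delay_s: float) -> None:
    # Stand-in for party B: replies with its first value after a delay.
    time.sleep(delay_s)
    inbox.put(first_value_b)


def a_after_sending_first_value(b_delay_s: float) -> str:
    # Assumes A has already sent firstValueA; the queue stands in for the network.
    inbox: queue.Queue = queue.Queue()
    threading.Thread(
        target=simulated_b, args=(inbox, b"firstValueB", b_delay_s), daemon=True
    ).start()
    reply = await_first_value(inbox)
    return "compare" if reply is not None else "abort"


print(a_after_sending_first_value(b_delay_s=0.0))  # compare
print(a_after_sending_first_value(b_delay_s=1.0))  # abort
```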

[0022] If A receives firstValueB in a timely way, A compares the received firstValueB with its own secondValueA. B likewise compares the received firstValueA with its own secondValueB. If the comparisons fail, A and B know that they have different documents; if the comparisons succeed, then with overwhelming statistical probability they have the same document. With that knowledge they may then proceed with their intended actions, or not.

[0023] Such comparisons may allow for a certain statistical dissimilarity in the strings, or range of equivalence. A strict application of a hash function such as SHA-1 to a bit stream, such as a document, produces a value that is computationally infeasible to reproduce by supplying a different meaningful bit stream. A strict application of the hash function therefore does not allow for variance resulting from transmission errors or conversion between formats; such variances would typically result in different hash codes. However, minor variation in the source can be handled. A document may be normalized before being passed to a hash function, or a hash function could be constructed that handles the normalization internally as part of its implementation. In this way, inconsequential differences between the documents, such as letter case and spacing, can be ignored.

[0024] For example, the parties could agree that whitespace (such as spaces, tabs and carriage returns) and character case are insignificant. The document could then be converted to a normalized form in which there is no whitespace and all characters are lowercase. The other approach would be to make the hash function itself ignore whitespace and convert characters to lowercase before passing them to the rest of the algorithm.
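Both normalization approaches can be sketched in a few lines of Python. This is a minimal sketch assuming text documents encoded as UTF-8, Python's `str.lower` as the case rule, and SHA-256 as the hash; `normalize` and `normalized_hash` are hypothetical names. Note that removing all whitespace also treats "a b" and "ab" as equivalent, which follows from the agreed rule.

```python
import hashlib


def normalize(document: str) -> bytes:
    """Drop all whitespace and lowercase the text, then encode as UTF-8."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, carriage returns, newlines), so joining removes it.
    return "".join(document.split()).lower().encode("utf-8")


def normalized_hash(document: str) -> bytes:
    # Variant in which the normalization is folded into the hash function itself.
    return hashlib.sha256(normalize(document)).digest()


doc1 = "The Quick\tBrown Fox\r\n"
doc2 = "the quick brown fox"
print(normalize(doc1))                                 # b'thequickbrownfox'
print(normalized_hash(doc1) == normalized_hash(doc2))  # True
# Without normalization the raw bytes, and hence the hashes, differ.
print(hashlib.sha256(doc1.encode()).digest() == hashlib.sha256(doc2.encode()).digest())  # False
```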

[0025] Thus, according to this method, neither A nor B can prove anything to a third party without revealing the documents. A and B do not exchange the actual documents or hashes of the bare documents. Further, neither A nor B can fool a third party C into thinking it has the document by mirroring, resending or replaying to C the hash received from the other party, because that hash is bound to the random data contributed by both participants in the original exchange. Nor can B plead computational delay, since it has already announced that its values are precomputed.
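The replay case can be illustrated with a short sketch. It assumes SHA-256 as the shared hash and 32-byte random strings; the variable names are hypothetical. B, who lacks the document, replays the value it received from A in an earlier exchange to C, who holds the document, and C's check fails because fresh random data is used in the new exchange.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    # Shared hash function (SHA-256 assumed for illustration).
    return hashlib.sha256(data).digest()


document = b"A's secret document"

# Exchange 1: A and B (B holds no copy of the document).
ra, rb = secrets.token_bytes(32), secrets.token_bytes(32)
first_value_a = h(document + ra + rb)  # received by B from A

# Exchange 2: B tries to convince C, who does hold the document.
rb2, rc = secrets.token_bytes(32), secrets.token_bytes(32)
second_value_c = h(document + rb2 + rc)  # what C expects B's first value to equal

# B replays the value obtained from A; it binds Ra and Rb, not Rb2 and Rc.
print(first_value_a == second_value_c)  # False
```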

[0026] The present invention is described above as a computer-implemented method. It may also be embodied as computer hardware apparatus, computer software code, or a combination of the two. The invention may also be embodied as a computer-readable storage medium embodying code for implementing the invention. Such a storage medium may be magnetic or optical, such as a hard or floppy disk, a CD-ROM, firmware or other storage media. The invention may also be embodied in a computer-readable modulated carrier signal.

[0027] As will be apparent to those skilled in the art in the light of the foregoing disclosure, many alterations and modifications are possible in the practice of this invention without departing from the spirit or scope thereof. Accordingly, the scope of the invention is to be construed in accordance with the substance defined by the following claims.

[0028] The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:

I claim:
 1. A method of securely comparing a first document in possession of a first party and a second document in possession of a second party, without revealing the contents of the first document to the second party or the contents of the second document to the first party, said method comprising the steps of: i) said first and second parties each generating its own set of random data; ii) each party exchanging said set of random data and a shared hash function with the other party; iii) each party computing a first value consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in each said party's possession, followed by that party's set of random data, followed by the other party's set of random data; iv) each party computing a second value consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in each said party's possession, followed by the other party's set of random data, followed by that party's set of random data; v) each party sending its first value to the other party and receiving the other party's first value; vi) each party comparing said other party's first value to its second value; and vii) each party concluding that if said values are the same, then the two documents are the same, but that otherwise said two documents are different.
 2. The method according to claim 1 further comprising the steps of: viii) after computing said first and second values according to steps iii) and iv) above, each of said first and second parties sending confirmation to the other party that each said party's first and second values have been computed, and waiting for said confirmation from said other party that each said party's first and second values have been computed before proceeding; and ix) after one party has sent its first value to the other party according to step v) above, aborting the comparison if the other party does not respond with its first value within a pre-determined length of time.
 3. The method according to claim 2 further comprising the steps of: x) after step i) and before step ii), each party examining the other party's set of random data for suitability and aborting the comparison if suitability is not established.
 4. The method according to claim 3 wherein said other party's random data is determined to be unsuitable if it is identical to said examining party's set of random data.
 5. The method according to claim 1 wherein said parties exchange two shared hash functions, a first hash function applied by said first party in step iii) and said second party in step iv) and a second hash function applied by said second party in step iii) and said first party in step iv).
 6. The method according to claim 1 wherein said documents are normalized prior to computation of said first and second values to allow the method to ignore inconsequential differences between said documents.
 7. The method according to claim 1 wherein said hash function is adapted to act on said documents in a normalized way to allow the method to ignore inconsequential differences between said documents.
 8. A computer program product for securely comparing a first document in possession of a first party and a second document in possession of a second party, without revealing the contents of the first document to the second party, said computer program product comprising: a computer usable medium having computer readable program code means embodied in said medium for: i) generating a set of random data for said first party; ii) exchanging said set of random data and a shared hash function with the other party; iii) computing a first value consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in each said party's possession, followed by that party's set of random data, followed by the other party's set of random data; iv) computing a second value consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in each said party's possession, followed by the other party's set of random data, followed by that party's set of random data; v) sending said first value to the other party and receiving the other party's first value; vi) comparing said other party's first value, consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in said other party's possession, followed by the other party's set of random data, followed by said set of random data, to its second value; and vii) concluding that if said values are the same, then the two documents are the same, but that otherwise said two documents are different.
 9. The computer program product of claim 8 wherein said computer usable medium further has computer readable program code means embodied in said medium for: viii) after computing said first and second values according to iii) and iv) above, sending confirmation to the other party that the first and second values have been computed, and waiting for confirmation from said other party that said other party's first and second values have been computed before proceeding; and ix) after sending its first value to the other party according to v) above, aborting the comparison if the other party does not respond with its first value within a pre-determined length of time.
 10. The computer program product of claim 9 wherein said computer usable medium further has computer readable program code means embodied in said medium for: x) after step i) and before step ii), examining the other party's set of random data for suitability and aborting the comparison if suitability is not established.
 11. The computer program product of claim 10 wherein said other party's random data is determined to be unsuitable if it is identical to said examining party's set of random data.
 12. The computer program product of claim 8 wherein said parties exchange two shared hash functions, a first hash function applied by said first party in step iii) and said second party in step iv) and a second hash function applied by said second party in step iii) and said first party in step iv).
 13. The computer program product of claim 8 wherein said documents are normalized prior to computation of said first and second values to allow the method to ignore inconsequential differences between said documents.
 14. The computer program product of claim 8 wherein said hash function is adapted to act on said documents in a normalized way to allow the method to ignore inconsequential differences between said documents.
 15. An article comprising: a computer readable modulated carrier signal; means embedded in said signal for securely comparing a first document in possession of a first party and a second document in possession of a second party, without revealing the contents of the first document to the second party, by: i) generating a set of random data for said first party; ii) exchanging said set of random data and a shared hash function with the other party; iii) computing a first value consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in each said party's possession, followed by that party's set of random data, followed by the other party's set of random data; iv) computing a second value consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in each said party's possession, followed by the other party's set of random data, followed by that party's set of random data; v) sending said first value to the other party and receiving the other party's first value; vi) comparing said other party's first value, consisting of the output of said shared hash function where the input to the hash function is the consecutive concatenation of the document in said other party's possession, followed by the other party's set of random data, followed by said set of random data, to its second value; and vii) concluding that if said values are the same, then the two documents are the same, but that otherwise said two documents are different.
 16. The article of claim 15 wherein said signal further has means embodied therein for: viii) after computing said first and second values according to iii) and iv) above, sending confirmation to the other party that each said party's first and second values have been computed, and waiting for said confirmation from said other party that said party's first and second values have been computed before proceeding; and ix) after sending its first value to the other party according to v) above, aborting the comparison if the other party does not respond with its first value within a pre-determined length of time.
 17. The article of claim 16 wherein said signal further has means embodied therein for: x) after step i) and before step ii), examining the other party's set of random data for suitability and aborting the comparison if suitability is not established.
 18. The article of claim 17 wherein said other party's random data is determined to be unsuitable if it is identical to said examining party's set of random data.
 19. The article of claim 15 wherein said parties exchange two shared hash functions, a first hash function applied by said first party in step iii) and said second party in step iv) and a second hash function applied by said second party in step iii) and said first party in step iv).
 20. The article of claim 15 wherein said documents are normalized prior to computation of said first and second values to allow the method to ignore inconsequential differences between said documents.
 21. The article of claim 15 wherein said hash function is adapted to act on said documents in a normalized way to allow the method to ignore inconsequential differences between said documents.